Search | Portal Regional da BVS

1.

A novel code representation for detecting Java code clones using high-level and abstract compiled code representations.

Quradaa, Fahmi H; Shahzad, Sara; Saeed, Rashad; Sufyan, Mubarak M.

PLoS One ; 19(5): e0302333, 2024.

Article in English | MEDLINE | ID: mdl-38728285

ABSTRACT

In software development, it's common to reuse existing source code by copying and pasting, resulting in the proliferation of numerous code clones (similar or identical code fragments) that detrimentally affect software quality and maintainability. Although several techniques for code clone detection exist, many encounter challenges in effectively identifying semantic clones due to their inability to extract syntax and semantics information. Fewer techniques leverage low-level source code representations like bytecode or assembly for clone detection. This work introduces a novel code representation for identifying syntactic and semantic clones in Java source code. It integrates high-level features extracted from the Abstract Syntax Tree with low-level features derived from intermediate representations generated by static analysis tools, like the Soot framework. Leveraging this combined representation, fifteen machine-learning models are trained to effectively detect code clones. Evaluation on a large dataset demonstrates the models' efficacy in accurately identifying semantic clones. Among these classifiers, ensemble classifiers, such as the LightGBM classifier, exhibit exceptional accuracy. Linearly combining features enhances the effectiveness of the models compared to multiplication and distance combination techniques. The experimental findings indicate that the proposed method can outperform the current clone detection techniques in detecting semantic clones.

Subjects

Semantics, Software, Programming Languages, Machine Learning, Algorithms
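
The three feature-combination strategies named in the abstract above can be illustrated with a minimal sketch. It assumes that each code fragment has already been reduced to a numeric feature vector, and it reads "linear combination" as concatenation, "multiplication" as an element-wise product, and "distance" as an element-wise absolute difference. These readings, the function names, and the sample vectors are illustrative assumptions, not the authors' implementation.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    """Element-wise operations need vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")


def combine_linear(a: Vector, b: Vector) -> Vector:
    """Assumed reading of 'linear combination': concatenate both vectors."""
    return a + b


def combine_multiplication(a: Vector, b: Vector) -> Vector:
    """Assumed reading of 'multiplication': element-wise product."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]


def combine_distance(a: Vector, b: Vector) -> Vector:
    """Assumed reading of 'distance': element-wise absolute difference."""
    _check_same_length(a, b)
    return [abs(x - y) for x, y in zip(a, b)]


# Hypothetical feature vectors for two code fragments.
fragment_a = [3.0, 1.0, 0.0]
fragment_b = [2.0, 1.0, 4.0]

print(combine_linear(fragment_a, fragment_b))
print(combine_multiplication(fragment_a, fragment_b))
print(combine_distance(fragment_a, fragment_b))
```

Under these assumptions, the combined vector for a pair of fragments would be the input row for a pair classifier; concatenation keeps every original feature, while the other two strategies compress the pair into a vector of the original length.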

2.

Exploring memory synchronization and performance considerations for FPGA platform using the high-abstracted OpenCL framework: Benchmarks development and analysis.

Almomany, Abedalmuhdi; Jarrah, Amin; Sutcu, Muhammed.

PLoS One ; 19(5): e0301720, 2024.

Article in English | MEDLINE | ID: mdl-38739583

ABSTRACT

A key benefit of the Open Computing Language (OpenCL) software framework is its capability to operate across diverse architectures. Field programmable gate arrays (FPGAs) are a high-speed computing architecture used for computation acceleration. This study investigates the impact of memory access time on overall performance in general FPGA computing environments through the creation of eight benchmarks within the OpenCL framework. The developed benchmarks capture a range of memory access behaviors, and they play a crucial role in assessing the performance of spinning and sleeping on FPGA-based architectures. The results obtained guide the formulation of new implementations and contribute to defining an abstraction of FPGAs. This abstraction is then utilized to create tailored implementations of primitives that are well-suited for this platform. While other research endeavors concentrate on creating benchmarks with the Compute Unified Device Architecture (CUDA) to scrutinize the memory systems across diverse GPU architectures and propose recommendations for future generations of GPU computation platforms, this study delves into the memory system analysis for the broader FPGA computing platform. It achieves this by employing the highly abstracted OpenCL framework, exploring various data workload characteristics, and experimentally delineating the appropriate implementation of primitives that can seamlessly integrate into a design tailored for the FPGA computing platform. Additionally, the results underscore the efficacy of employing a task-parallel model to mitigate the need for high-cost synchronization mechanisms in designs constructed on general FPGA computing platforms.

Subjects

Benchmarking, Software, Humans, Programming Languages

3.

Biology System Description Language (BiSDL): a modeling language for the design of multicellular synthetic biological systems.

Giannantoni, Leonardo; Bardini, Roberta; Savino, Alessandro; Di Carlo, Stefano.

BMC Bioinformatics ; 25(1): 166, 2024 Apr 25.

Article in English | MEDLINE | ID: mdl-38664639

ABSTRACT

BACKGROUND: The Biology System Description Language (BiSDL) is an accessible, easy-to-use computational language for multicellular synthetic biology. It allows synthetic biologists to represent spatiality and multi-level cellular dynamics inherent to multicellular designs, filling a gap in the state of the art. Developed for designing and simulating spatial, multicellular synthetic biological systems, BiSDL integrates high-level conceptual design with detailed low-level modeling, fostering collaboration in the Design-Build-Test-Learn cycle. BiSDL descriptions directly compile into Nets-Within-Nets (NWNs) models, offering a unique approach to spatial and hierarchical modeling in biological systems. RESULTS: BiSDL's effectiveness is showcased through three case studies on complex multicellular systems: a bacterial consortium, a synthetic morphogen system and a conjugative plasmid transfer process. These studies highlight the BiSDL proficiency in representing spatial interactions and multi-level cellular dynamics. The language facilitates the compilation of conceptual designs into detailed, simulatable models, leveraging the NWNs formalism. This enables intuitive modeling of complex biological systems, making advanced computational tools more accessible to a broader range of researchers. CONCLUSIONS: BiSDL represents a significant step forward in computational languages for synthetic biology, providing a sophisticated yet user-friendly tool for designing and simulating complex biological systems with an emphasis on spatiality and cellular dynamics. Its introduction has the potential to transform research and development in synthetic biology, allowing for deeper insights and novel applications in understanding and manipulating multicellular systems.

Subjects

Synthetic Biology, Synthetic Biology/methods, Biological Models, Programming Languages, Systems Biology/methods, Software

4.

ChatGPT-Enhanced ROC Analysis (CERA): A shiny web tool for finding optimal cutoff points in biomarker analysis.

Agraz, Melih; Mantzoros, Christos; Karniadakis, George Em.

PLoS One ; 19(4): e0289141, 2024.

Article in English | MEDLINE | ID: mdl-38598521

ABSTRACT

Diagnostic tests play a crucial role in establishing the presence of a specific disease in an individual. Receiver Operating Characteristic (ROC) curve analyses are essential tools that provide performance metrics for diagnostic tests. Accurate determination of the cutoff point in ROC curve analyses is the most critical aspect of the process. A variety of methods have been developed to find the optimal cutoffs. Although the R programming language provides a variety of package programs for conducting ROC curve analysis and determining the appropriate cutoffs, it typically needs coding skills and a substantial investment of time. Specifically, the necessity for data preprocessing and analysis can present a significant challenge, especially for individuals without coding experience. We have developed the CERA (ChatGPT-Enhanced ROC Analysis) tool, a user-friendly ROC curve analysis web tool using the shiny interface for faster and more effective analyses to solve this problem. CERA is not only user-friendly, but it also interacts with ChatGPT, which interprets the outputs. This allows for an interpreted report generated by R-Markdown to be presented to the user, enhancing the accessibility and understanding of the analysis results.

Subjects

Programming Languages, Software, Humans, ROC Curve, Biomarkers
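
The abstract above mentions that several methods exist for choosing an optimal ROC cutoff. One widely used option is Youden's J statistic, J = sensitivity + specificity - 1, maximized over candidate cutoffs. The sketch below shows that single method in plain Python; it is not taken from CERA, and the function name and sample data are hypothetical.

```python
from typing import List, Tuple


def youden_cutoff(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A sample is called positive when its score is >= cutoff.
    Labels are 1 for diseased and 0 for healthy.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")

    best_cutoff, best_j = scores[0], float("-inf")
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        true_neg = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:  # on ties, the lowest cutoff is kept
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical biomarker values and disease labels.
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 1]

cutoff, j = youden_cutoff(scores, labels)
print(cutoff, j)
```

The loop is quadratic in the number of samples, which is acceptable for a sketch; other cutoff criteria would only change the expression computed for `j`.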

5.

Infection Control Through Clinical Pipelines Built with Arden Syntax MLM Building Blocks.

Hauptfeld, Leonhard; Rappelsberger, Andrea; Adlassnig, Klaus-Peter.

Stud Health Technol Inform ; 313: 167-172, 2024 Apr 26.

Article in English | MEDLINE | ID: mdl-38682525

ABSTRACT

Healthcare-associated infections (HAIs) may have grave consequences for patients. In the case of sepsis, the 30-day mortality rate is about 25%. HAIs cost EU member states an estimated 7 billion Euros annually. Clinical decision support tools may be useful for infection monitoring, early warning, and alerts. MONI, a tool for monitoring nosocomial infections, is used at University Hospital Vienna, but needs to be clinically and technically revised and updated. A new, completely configurable pipeline-based system for defining and processing HAI definitions was developed and validated. A network of data access points, clinical rules, and explanatory output is arranged as an inference network, a clinical pipeline as it is called, and processed in a stepwise manner. Arden-Syntax-based medical logic modules were used to implement the respective rules. The system was validated by creating a pipeline for the ECDC PN5 pneumonia rule. It was tested on a set of patient data from intensive care medicine. The results were compared with previously obtained MONI output as a suitable reference, yielding a sensitivity of 93.8% and a specificity of 99.8%. Clinical pipelines show promise as an open and configurable approach to graphically-based, human-readable, machine-executable HAI definitions.

Subjects

Cross Infection, Clinical Decision Support Systems, Humans, Cross Infection/prevention & control, Infection Control, Austria, Programming Languages, Software
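
The stepwise processing of a network of data access points, rules, and explanatory output, as described in the abstract above, can be sketched as the evaluation of a dependency graph in topological order. The example below is a toy under stated assumptions: it uses Python's standard `graphlib` module rather than Arden Syntax, and the node names, the threshold of 10.0, and the `run_pipeline` helper are hypothetical and carry no clinical meaning.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_pipeline(
    steps: Dict[str, Callable[..., Any]],
    dependencies: Dict[str, List[str]],
    inputs: Dict[str, Any],
) -> Dict[str, Any]:
    """Evaluate each step once all of its predecessors have a result."""
    results = dict(inputs)
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        if name in results:  # raw input supplied by a data access point
            continue
        args = [results[dep] for dep in dependencies.get(name, [])]
        results[name] = steps[name](*args)
    return results


# Hypothetical three-node pipeline: input -> rule -> explanatory output.
steps = {
    "above_threshold": lambda value: value >= 10.0,  # arbitrary toy threshold
    "explanation": lambda fired: "rule fired" if fired else "rule did not fire",
}
dependencies = {
    "above_threshold": ["measurement"],
    "explanation": ["above_threshold"],
}

results = run_pipeline(steps, dependencies, {"measurement": 12.5})
print(results["explanation"])
```

Keeping the graph (`dependencies`) separate from the step logic (`steps`) mirrors the idea of a configurable pipeline: the wiring can change without touching the rules themselves.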

6.

MCell4 with BioNetGen: A Monte Carlo simulator of rule-based reaction-diffusion systems with Python interface.

Husar, Adam; Ordyan, Mariam; Garcia, Guadalupe C; Yancey, Joel G; Saglam, Ali S; Faeder, James R; Bartol, Thomas M; Kennedy, Mary B; Sejnowski, Terrence J.

PLoS Comput Biol ; 20(4): e1011800, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38656994

ABSTRACT

Biochemical signaling pathways in living cells are often highly organized into spatially segregated volumes, membranes, scaffolds, subcellular compartments, and organelles comprising small numbers of interacting molecules. At this level of granularity stochastic behavior dominates, well-mixed continuum approximations based on concentrations break down and a particle-based approach is more accurate and more efficient. We describe and validate a new version of the open-source MCell simulation program (MCell4), which supports generalized 3D Monte Carlo modeling of diffusion and chemical reaction of discrete molecules and macromolecular complexes in solution, on surfaces representing membranes, and combinations thereof. The main improvements in MCell4 compared to the previous versions, MCell3 and MCell3-R, include a Python interface and native BioNetGen reaction language (BNGL) support. MCell4's Python interface opens up completely new possibilities for interfacing with external simulators to allow creation of sophisticated event-driven multiscale/multiphysics simulations. The native BNGL support, implemented through a new open-source library libBNG (also introduced in this paper), provides the capability to run a given BNGL model spatially resolved in MCell4 and, with appropriate simplifying assumptions, also in the BioNetGen simulation environment, greatly accelerating and simplifying model validation and comparison.

Subjects

Monte Carlo Method, Software, Diffusion, Computer Simulation, Biological Models, Programming Languages, Computational Biology/methods, Signal Transduction/physiology
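
To make the particle-based approach in the abstract above concrete, the following sketch simulates free 3D diffusion of independent point particles as a Gaussian random walk and compares the mean squared displacement with the textbook value 6·D·t. It is a generic illustration, not MCell4's algorithm or its Python interface; the function names, parameter values, and arbitrary units are assumptions.

```python
import math
import random
from typing import List


def simulate_diffusion(
    n_particles: int,
    n_steps: int,
    diffusion_coeff: float,
    dt: float,
    seed: int = 0,
) -> List[List[float]]:
    """Move each particle by a Gaussian step per axis at every time step."""
    rng = random.Random(seed)
    # Per-axis standard deviation of one Brownian step: sqrt(2 * D * dt).
    sigma = math.sqrt(2.0 * diffusion_coeff * dt)
    positions = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    for _ in range(n_steps):
        for pos in positions:
            for axis in range(3):
                pos[axis] += rng.gauss(0.0, sigma)
    return positions


def mean_squared_displacement(positions: List[List[float]]) -> float:
    """Average squared distance from the origin, where all particles start."""
    return sum(x * x + y * y + z * z for x, y, z in positions) / len(positions)


# Hypothetical parameters in arbitrary units.
D, dt, steps = 1.0, 0.01, 50
final_positions = simulate_diffusion(1000, steps, D, dt, seed=1)
print("simulated MSD:", mean_squared_displacement(final_positions))
print("expected 6*D*t:", 6.0 * D * steps * dt)
```

With few particles the simulated value fluctuates noticeably around the expected one, which is the stochastic behavior that continuum approximations average away.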

7.

Automated code development based on genetic programming in graphical programming language: A pilot study.

Kodytek, Pavel; Bodzas, Alexandra; Zidek, Jan.

PLoS One ; 19(3): e0299456, 2024.

Article in English | MEDLINE | ID: mdl-38452131

ABSTRACT

Continual technological advances associated with the recent automation revolution have tremendously increased the impact of computer technology in the industry. Software development and testing are time-consuming processes, and the current market faces a lack of specialized experts. Introducing automation to this field could, therefore, improve software engineers' common workflow and decrease the time to market. Even though many code-generating algorithms have been proposed in textual-based programming languages, to the best of the authors' knowledge, none of the studies deals with the implementation of such algorithms in graphical programming environments, especially LabVIEW. Due to this fact, the main goal of this study is to conduct a proof-of-concept for a requirement-based automated code-developing system within the graphical programming environment LabVIEW. The proposed framework was evaluated on four basic benchmark problems, encompassing a string model, a numeric model, a boolean model and a mixed-type problem model, which covers fundamental programming scenarios. In all tested cases, the algorithm demonstrated an ability to create satisfying functional and errorless solutions that met all user-defined requirements. Even though the generated programs were burdened with redundant objects and were much more complex compared to programmer-developed codes, this fact has no effect on the code's execution speed or accuracy. Based on the achieved results, we can conclude that this pilot study not only proved the feasibility and viability of the proposed concept, but also showed promising results in solving linear and binary programming tasks. Furthermore, the results revealed that with further research, this poorly explored field could become a powerful tool not only for application developers but also for non-programmers and low-skilled users.

Subjects

Programming Languages, Software, Pilot Projects, Algorithms, Automation

8.

Computer programmers show distinct, expertise-dependent brain responses to violations in form and meaning when reading code.

Kuo, Chu-Hsuan; Prat, Chantel S.

Sci Rep ; 14(1): 5404, 2024 Mar 05.

Article in English | MEDLINE | ID: mdl-38443678

ABSTRACT

As computer programming becomes more central to the workforce, the need for better models of how it is effectively learned has become more apparent. The current study addressed this gap by recording electrophysiological brain responses as 62 Python programmers with varying skill levels read lines of code with manipulations of form (syntax) and meaning (semantics). At the group level, results showed that manipulations of form resulted in P600 effects, with syntactically invalid code generating more positive deflections in the 500-800 ms range than syntactically valid code. Meaning manipulations resulted in N400 effects, with semantically implausible code generating more negative deflections in the 300-500 ms range than semantically plausible code. Greater Python expertise within the group was associated with greater sensitivity to violations in form. These results support the notion that skilled programming, like skilled natural language learning, is associated with the incorporation of rule-based knowledge into online comprehension processes. Conversely, programmers at all skill levels showed neural sensitivity to meaning manipulations, suggesting that reliance on pre-existing semantic relationships facilitates code comprehension across skill levels.

Subjects

Brain, Programming Languages, Humans, Brain/physiology, Learning

9.

RDCanon: A Python Package for Canonicalizing the Order of Tokens in SMARTS Queries.

Mahjour, Babak A; Coley, Connor W.

J Chem Inf Model ; 64(8): 2948-2954, 2024 Apr 22.

Article in English | MEDLINE | ID: mdl-38488634

ABSTRACT

SMARTS is a widely used language in cheminformatics for defining substructural queries for database lookups, reaction templates for chemical transformations, and other applications. As an extension to SMILES, many SMARTS patterns can represent the same query. Despite this, no canonicalization algorithm invariant of the line notation sequence or atomic numbering is publicly available. Here, we introduce RDCanon, an open-source Python package that can be used to standardize SMARTS queries. RDCanon is designed to ensure that the sequence of atomic queries remains consistent for all graphs representing the same substructure query and to ensure a canonical sequence of primitives within each individual atom query; furthermore, the algorithm can be applied to canonicalize the order of reactants, agents, and products and their atom map numbers in reaction SMARTS templates. As part of its canonicalization algorithm, RDCanon provides a mechanism in which the canonicalized SMARTS is optimized for speed against specific molecular databases. Several case studies are provided to showcase improved efficiency in substructure matching and retrosynthetic analysis.

Subjects

Algorithms, Software, Programming Languages, Cheminformatics/methods, Chemical Databases

10.

Genetic Network Design Automation with LOICA.

Vidal, Gonzalo; Vitalis, Carolus; Matúte, Tamara; Núñez, Isaac; Federici, Fernán; Rudge, Timothy J.

Methods Mol Biol ; 2760: 393-412, 2024.

Article in English | MEDLINE | ID: mdl-38468100

ABSTRACT

Genetic design automation (GDA) is the use of computer-aided design (CAD) in designing genetic networks. GDA tools are necessary to create more complex synthetic genetic networks in a high-throughput fashion. At the core of these tools is the abstraction of a hierarchy of standardized components. The components' input, output, and interactions must be captured and parametrized from relevant experimental data. Simulations of genetic networks should use those parameters and include the experimental context to be compared with the experimental results. This chapter introduces Logical Operators for Integrated Cell Algorithms (LOICA), a Python package used for designing, modeling, and characterizing genetic networks using a simple object-oriented design abstraction. LOICA represents different biological and experimental components as classes that interact to generate models. These models can be parametrized by direct connection to the Flapjack experimental data management platform to characterize abstracted components with experimental data. The models can be simulated using stochastic simulation algorithms or ordinary differential equations with varying noise levels. The simulated data can be managed and published using Flapjack alongside experimental data for comparison. LOICA genetic network designs can be represented as graphs and plotted as networks for visual inspection and serialized as Python objects or in the Synthetic Biology Open Language (SBOL) format for sharing and use in other designs.

Subjects

Programming Languages, Software, Gene Regulatory Networks, Algorithms, Synthetic Biology/methods, Automation

11.

No installation required: how WebAssembly is changing scientific computing.

Perkel, Jeffrey M.

Nature ; 627(8003): 455-456, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38467881

Subjects

Software, User-Computer Interface, Software/trends, Internet, Web Browser, Programming Languages

12.

Simplifying Multimodal Clinical Research Data Management: Introducing an Integrated and User-friendly Database Concept.

Schweinar, Anna; Wagner, Franziska; Klingner, Carsten; Festag, Sven; Spreckelsen, Cord; Brodoehl, Stefan.

Appl Clin Inform ; 15(2): 234-249, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38301729

ABSTRACT

BACKGROUND: Clinical research, particularly in scientific data, grapples with the efficient management of multimodal and longitudinal clinical data. Especially in neuroscience, the volume of heterogeneous longitudinal data challenges researchers. While current research data management systems offer rich functionality, they suffer from architectural complexity that makes them difficult to install and maintain and require extensive user training. OBJECTIVES: The focus is the development and presentation of a data management approach specifically tailored for clinical researchers involved in active patient care, especially in the neuroscientific environment of German university hospitals. Our design considers the implementation of FAIR (Findable, Accessible, Interoperable, and Reusable) principles and the secure handling of sensitive data in compliance with the General Data Protection Regulation. METHODS: We introduce a streamlined database concept, featuring an intuitive graphical interface built on Hypertext Markup Language revision 5 (HTML5)/Cascading Style Sheets (CSS) technology. The system can be effortlessly deployed within local networks, that is, in Microsoft Windows 10 environments. Our design incorporates FAIR principles for effective data management. Moreover, we have streamlined data interchange through established standards like HL7 Clinical Document Architecture (CDA). To ensure data integrity, we have integrated real-time validation mechanisms that cover data type, plausibility, and Clinical Quality Language logic during data import and entry. RESULTS: We have developed and evaluated our concept with clinicians using a sample dataset of subjects who visited our memory clinic over a 3-year period and collected several multimodal clinical parameters. A notable advantage is the unified data matrix, which simplifies data aggregation, anonymization, and export. This streamlines data exchange and enhances database integration with platforms like Konstanz Information Miner (KNIME). CONCLUSION: Our approach offers a significant advancement for capturing and managing clinical research data, specifically tailored for small-scale initiatives operating within limited information technology (IT) infrastructures. It is designed for immediate, hassle-free deployment by clinicians and researchers. The database template and precompiled versions of the user interface are available at: https://github.com/stebro01/research_database_sqlite_i2b2.git.

Subjects

Data Management, Programming Languages, Humans

13.

Open-source milligram-scale, four channel, automated protein purification system.

Puccinelli, Robert R; Sama, Samia S; Worthington, Caroline M; Puschnik, Andreas S; Pak, John E; Gómez-Sjöberg, Rafael.

PLoS One ; 19(2): e0297879, 2024.

Article in English | MEDLINE | ID: mdl-38394072

ABSTRACT

Liquid chromatography purification of multiple recombinant proteins, in parallel, could catalyze research and discovery if the processes are fast and approach the robustness of traditional, "one-protein-at-a-time" purification. Here, we report an automated, four channel chromatography platform that we have designed and validated for parallelized protein purification at milligram scales. The device can purify up to four proteins (each with its own single column), has inputs for up to eight buffers or solvents that can be directed to any of the four columns via a network of software-driven valves, and includes an automated fraction collector with ten positions for 1.5 or 5.0 mL collection tubes and four positions for 50 mL collection tubes for each column output. The control software can be accessed either via Python scripting, giving users full access to all steps of the purification process, or via a simple-to-navigate touch screen graphical user interface that does not require knowledge of the command line or any programming language. Using our instrument, we report milligram-scale, parallelized, single-column purification of a panel of mammalian cell expressed coronavirus (SARS-CoV-2, HCoV-229E, HCoV-OC43, HCoV-229E) trimeric Spike and monomeric Receptor Binding Domain (RBD) antigens, and monoclonal antibodies targeting SARS-CoV-2 Spike (S) and Influenza Hemagglutinin (HA). We include a detailed hardware build guide, and have made the controlling software open source, to allow others to build and customize their own protein purifier systems.

Subjects

Human Coronavirus 229E, Human Coronavirus OC43, Animals, SARS-CoV-2, Recombinant Proteins/metabolism, Software, Programming Languages, Coronavirus Spike Glycoprotein/metabolism, Mammals

14.

A systematic literature review on the applications of recurrent neural networks in code clone research.

Quradaa, Fahmi H; Shahzad, Sara; Almoqbily, Rashad S.

PLoS One ; 19(2): e0296858, 2024.

Article in English | MEDLINE | ID: mdl-38306372

ABSTRACT

Code clones, referring to code fragments that are either similar or identical and are copied and pasted within software systems, have negative effects on both software quality and maintenance. The objective of this work is to systematically review and analyze recurrent neural network techniques used to detect code clones to shed light on the current techniques and offer valuable knowledge to the research community. Upon applying the review protocol, we have successfully identified 20 primary studies within this field from a total of 2099 studies. A deep investigation of these studies reveals that nine recurrent neural network techniques have been utilized for code clone detection, with a notable preference for LSTM techniques. These techniques have demonstrated their efficacy in detecting both syntactic and semantic clones, often utilizing abstract syntax trees for source code representation. Moreover, we observed that most studies applied evaluation metrics like F-score, precision, and recall. Additionally, these studies frequently utilized datasets extracted from open-source systems coded in Java and C programming languages. Notably, the Graph-LSTM technique exhibited superior performance. PyTorch and TensorFlow emerged as popular tools for implementing RNN models. To advance code clone detection research, further exploration of techniques like parallel LSTM, sentence-level LSTM, and Tree-Structured GRU is imperative. In addition, more research is needed to investigate the capabilities of the recurrent neural network techniques for identifying semantic clones across different programming languages and binary codes. The development of standardized benchmarks for languages like Python, Scratch, and C#, along with cross-language comparisons, is essential. Therefore, the utilization of recurrent neural network techniques for clone identification is a promising area that demands further research.

Subjects

Computer Neural Networks, Software, Programming Languages, Language, Semantics

15.

SimService: a lightweight library for building simulation services in Python.

Sego, T J.

Bioinformatics ; 40(1), 2024 Jan 02.

Article in English | MEDLINE | ID: mdl-38237907

ABSTRACT

SUMMARY: Integrative biological modeling requires software infrastructure to launch, interconnect, and execute simulation software components without loss of functionality. SimService is a software library that enables deploying simulations in integrated applications as memory-isolated services with interactive proxy objects in the Python programming language. SimService supports customizing the interface of proxies so that simulation developers and users alike can tailor generated simulation instances according to model, method, and integrated application. AVAILABILITY AND IMPLEMENTATION: SimService is written in Python, is freely available on GitHub under the MIT license at https://github.com/tjsego/simservice, and is available for download via the Python Package Index (package name "simservice") and conda (package name "simservice" on the conda-forge channel).

Subjects

Programming Languages, Software, Computer Simulation, Gene Library

16.

vcfpp: a C++ API for rapid processing of the variant call format.

Li, Zilong.

Bioinformatics ; 40(2), 2024 Feb 01.

Article in English | MEDLINE | ID: mdl-38273677

ABSTRACT

MOTIVATION: Given the widespread use of the variant call format (VCF/BCF) coupled with continuous surge in big data, there remains a perpetual demand for fast and flexible methods to manipulate these comprehensive formats across various programming languages. RESULTS: This work presents vcfpp, a C++ API of HTSlib in a single file, providing an intuitive interface to manipulate VCF/BCF files rapidly and safely, in addition to being portable. Moreover, this work introduces the vcfppR package to demonstrate the development of a high-performance R package with vcfpp, allowing for rapid and straightforward variants analyses. AVAILABILITY AND IMPLEMENTATION: vcfpp is available from https://github.com/Zilong-Li/vcfpp under MIT license. vcfppR is available from https://cran.r-project.org/web/packages/vcfppR.

Subjects

Programming Languages, Software, Big Data
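
For readers unfamiliar with the format handled by the tool in the abstract above, the sketch below parses one VCF data line into its eight fixed, tab-separated fields (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). It is plain Python for illustration only and does not use or imitate the vcfpp or HTSlib APIs; the `VcfRecord` type, the `parse_vcf_line` helper, and the sample line are hypothetical.

```python
from typing import Dict, List, NamedTuple, Union


class VcfRecord(NamedTuple):
    chrom: str
    pos: int
    variant_id: str
    ref: str
    alt: List[str]
    qual: str
    filter_status: str
    info: Dict[str, Union[str, bool]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse the eight fixed columns of a single VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated fields")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    info_dict: Dict[str, Union[str, bool]] = {}
    if info != ".":  # "." marks a missing value
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # Keys without "=" are flags and are stored as True.
            info_dict[key] = value if sep else True

    return VcfRecord(
        chrom, int(pos), variant_id, ref, alt.split(","),
        qual, filter_status, info_dict,
    )


# Hypothetical sample line (tabs written as \t).
sample = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n"
record = parse_vcf_line(sample)
print(record.chrom, record.pos, record.alt, record.info)
```

Header lines (those starting with `#`), genotype columns, and typed INFO values are deliberately left out; a production parser would take the value types from the file header.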

17.

ReUseData: an R/Bioconductor tool for reusable and reproducible genomic data management.

Liu, Qian; Hu, Qiang; Liu, Song; Hutson, Alan; Morgan, Martin.

BMC Bioinformatics ; 25(1): 8, 2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38172657

ABSTRACT

BACKGROUND: The increasing volume and complexity of genomic data pose significant challenges for effective data management and reuse. Public genomic data often undergo similar preprocessing across projects, leading to redundant or inconsistent datasets and inefficient use of computing resources. This is especially pertinent for bioinformaticians engaged in multiple projects. Tools have been created to address challenges in managing and accessing curated genomic datasets, however, the practical utility of such tools becomes especially beneficial for users who seek to work with specific types of data or are technically inclined toward a particular programming language. Currently, there exists a gap in the availability of an R-specific solution for efficient data management and versatile data reuse. RESULTS: Here we present ReUseData, an R software tool that overcomes some of the limitations of existing solutions and provides a versatile and reproducible approach to effective data management within R. ReUseData facilitates the transformation of ad hoc scripts for data preprocessing into Common Workflow Language (CWL)-based data recipes, allowing for the reproducible generation of curated data files in their generic formats. The data recipes are standardized and self-contained, enabling them to be easily portable and reproducible across various computing platforms. ReUseData also streamlines the reuse of curated data files and their integration into downstream analysis tools and workflows with different frameworks. CONCLUSIONS: ReUseData provides a reliable and reproducible approach for genomic data management within the R environment to enhance the accessibility and reusability of genomic data. The package is available at Bioconductor ( https://bioconductor.org/packages/ReUseData/ ) with additional information on the project website ( https://rcwl.org/dataRecipes/ ).

Subjects

Data Management, Genomics, Software, Programming Languages, Workflow

18.

cytoviewer: an R/Bioconductor package for interactive visualization and exploration of highly multiplexed imaging data.

Meyer, Lasse; Eling, Nils; Bodenmiller, Bernd.

BMC Bioinformatics ; 25(1): 9, 2024 Jan 03.

Article in English | MEDLINE | ID: mdl-38172724

ABSTRACT

BACKGROUND: Highly multiplexed imaging enables single-cell-resolved detection of numerous biological molecules in their spatial tissue context. Interactive visualization of multiplexed imaging data is crucial at any step of data analysis to facilitate quality control and the spatial exploration of single cell features. However, tools for interactive visualization of multiplexed imaging data are not available in the statistical programming language R. RESULTS: Here, we describe cytoviewer, an R/Bioconductor package for interactive visualization and exploration of multi-channel images and segmentation masks. The cytoviewer package supports flexible generation of image composites, allows side-by-side visualization of single channels, and facilitates the spatial visualization of single-cell data in the form of segmentation masks. As such, cytoviewer improves image and segmentation quality control, the visualization of cell phenotyping results and qualitative validation of hypothesis at any step of data analysis. The package operates on standard data classes of the Bioconductor project and therefore integrates with an extensive framework for single-cell and image analysis. The graphical user interface allows intuitive navigation and little coding experience is required to use the package. We showcase the functionality and biological application of cytoviewer by analysis of an imaging mass cytometry dataset acquired from cancer samples. CONCLUSIONS: The cytoviewer package offers a rich set of features for highly multiplexed imaging data visualization in R that seamlessly integrates with the workflow for image and single-cell data analysis. It can be installed from Bioconductor via https://www.bioconductor.org/packages/release/bioc/html/cytoviewer.html . The development version and further instructions can be found on GitHub at https://github.com/BodenmillerGroup/cytoviewer .

Subjects

Neoplasms, Software, Humans, Programming Languages, Computer-Assisted Image Processing

19.

R programming environment in wildlife: Are Veterinary Sciences at the same level than other research areas?

Gonzálvez, Moisés; Muñoz-Hernández, Clara.

Res Vet Sci ; 166: 105079, 2024 Jan.

Article in English | MEDLINE | ID: mdl-37963421

ABSTRACT

Computing environments have revolutionized the management and analysis of scientific data over recent decades. This study evaluated the use of R software in research articles on wildlife worldwide, focusing on the research area "Veterinary Sciences". To this end, a systematic review was conducted, mainly in the Web of Science database. Across the 509 articles reviewed, the number of publications using R increased over time, with a wide global geographical distribution, particularly in North America, Europe, Australia and China. Most publications were categorized in research areas related to "Biological Sciences", while a minority were included in "Veterinary Sciences" (5.9%; 30/509). Regarding the species groups assessed, most articles evaluated a single species group (96.5%), with mammals (50.7%) and birds (14.8%) the most studied. The publications reviewed used a wide variety of R packages, all related to data analysis, the study of genetic/phylogenetic information, and graphical representation. Interestingly, the common use of packages across research areas indicates strong interest in using R in scientific articles. Our study identifies R as an open-source programming language that supports research on wildlife and has become key software for many research areas, including "Veterinary Sciences". However, we encourage in-depth methodological descriptions of how R is used in publications to improve tracking, reproducibility and transparency.

Subjects

Wild Animals, Software, Animals, Phylogeny, Reproducibility of Results, Programming Languages, Mammals

20.

CodonU: A Python Package for Codon Usage Analysis.

Choudhuri, Souradipto; Sau, Keya.

IEEE/ACM Trans Comput Biol Bioinform ; 21(1): 36-44, 2024.

Article in English | MEDLINE | ID: mdl-38015670

ABSTRACT

Codon Usage Analysis (CUA) is supported by several web servers and standalone programs written in various programming languages. This diversity points to the need for reusable software that can read and manipulate such data and file formats and act as a pipeline for them. Analyses of this kind rely on multiple tools to address the multifaceted aspects of CUA. We therefore propose CodonU, a package written in Python, to integrate all these aspects. It is compatible with existing file formats and can be used alone or together with other such packages. The package incorporates various statistical measures necessary for codon usage analysis. The measures vary with the nature of the sequences: for nucleotide sequences, the codon adaptation index (CAI), codon bias index (CBI), tRNA adaptation index (tAI), etc.; for protein sequences, the Gravy score, etc. Users can also perform correspondence analysis (COA). The package also lets users generate graphics and build phylogenetic trees. The capabilities of the package were checked thoroughly on a genomic set of Staphylococcus aureus.
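
For context, the codon adaptation index (CAI) named in this abstract is commonly defined as the geometric mean of per-codon relative adaptiveness weights, where each weight is a codon's count in a reference gene set divided by the count of the most frequent synonymous codon. The sketch below is a minimal plain-Python illustration of that definition. It does not use or reproduce CodonU's API; the function names (`relative_adaptiveness`, `cai`), the use of the standard genetic code, and the 0.5 pseudocount for unobserved codons are all assumptions made for illustration.

```python
import math
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Split a nucleotide sequence into complete codons (RNA 'U' read as 'T')."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_seqs, pseudocount=0.5):
    """Weight each codon by its count relative to the most used synonym.

    Stop codons and single-codon amino acids (Met, Trp) get no weight.
    The pseudocount for unobserved codons is an illustrative assumption.
    """
    counts = Counter(
        codon
        for seq in reference_seqs
        for codon in split_codons(seq)
        if codon in CODON_TABLE
    )
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    weights = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue
        family_counts = {c: counts[c] or pseudocount for c in family}
        top = max(family_counts.values())
        for codon, n in family_counts.items():
            weights[codon] = n / top
    return weights


def cai(seq, weights):
    """Geometric mean of the weights of the scored codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("sequence contains no codons with a weight")
    return math.exp(sum(logs) / len(logs))
```

A typical call would first build the weights from a reference set, for example `w = relative_adaptiveness(highly_expressed_genes)`, and then score each gene with `cai(gene, w)`; here `highly_expressed_genes` is a hypothetical list of coding sequences. Note that in this sketch a synonymous family with no observations in the reference set ends up with all weights equal to 1, which a production implementation may want to handle differently.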

Subjects

Codon Usage, Software, Phylogeny, Programming Languages, Codon/genetics

